Motivation and Objectives

In September 2014, the Federal Bureau of Investigation (FBI) published a report, “A Study of Active Shooter Incidents in the United States Between 2000 and 2013,” stating that the incidence of school and mass shootings has been increasing drastically since the turn of the century \(^{[1,2]}\). According to a recent study, on average the United States experiences approximately one mass killing every two weeks and one school shooting every month \(^{[4]}\). In fact, as recently as November 14th, 2019, a 16-year-old opened fire at a high school in Santa Clarita, California, killing 2 people and injuring 3 others \(^{[3]}\). This increasing trend has raised serious public concern about school safety and called into question the efficacy of existing gun laws in different states. To prevent such traumatic incidents, state legislators should consider implementing effective gun policies that would bring a notable reduction in school shootings. We believe that an in-depth analysis of the impact of each state's gun policies on school shootings may help convince state legislators to establish appropriate interventions. Thus, the primary aim of this study is to investigate the association between the strength of state gun laws and the incidence of school shootings.

Hypothesis Questions

School shootings are complex phenomena, intricately linked with various factors such as mental health, bullying, school safety measures, socioeconomic conditions and, of course, the availability of guns. The goal of the project is to investigate the impact of gun policies on school shooting rates in different states. The analysis is designed to provide state legislators with valuable insights into the impact of gun policies on school shooting incidents. In order to analyze school shootings in depth, we consider the following aspects within each state over the study period:

  1. Overall impacts of gun policies
  2. Effects of bullying laws
  3. Prevalence of mental illness
  4. Prevalence of bullying
  5. School environment
  6. Effects of media coverage
  7. Impact of hostility of school environment
  8. Impact of poverty prevalence rate
  9. Impact of specific gun laws

Data

School Shootings

The main dataset used in this analysis was obtained from the Center for Homeland Defense and Security website. It contains details about school shooting incidents occurring in the United States from 1970 to 2019. The data are open-source and monitored by a research team at the CHDS. They were gathered from different sources such as newspaper articles, online news reports, court records or police reports, and statements/interviews with law enforcement officials. Each row of the dataset corresponds to one specific incident. The dataset contains variables such as:

  • Location of incident
  • Age/gender/race of shooter
  • Number of victims
  • Number of wounded
  • Age of victim(s)
  • Time of day/day of week
  • Motive/category
  • History of bullying (of shooter)
  • History of domestic violence
  • Weapon type
  • etc.

The dataset also contains a variable named Reliability Score, indicating how the data were obtained. Reliability is scored on a 5-point scale according to the source of the information, with the following scoring assignment:

  1. Independent Single Author/Moderator Blog, report/list lacking citations, or cited source cannot be located (e.g., newspaper headline title and story does not appear in searches)
  2. Single Newspaper Article, Online News Report (Network, Cable, Online)
  3. Multiple News Sources
  4. Hundreds of News Sources, Statement/Interview from Law Enforcement Official, News report on court ruling
  5. Court Records or Police Report

We determined that observations with a reliability score of 1 may not be accurate, and hence discarded them from further analysis. The dataset also has a variable indicating the category/cause of each shooting incident. Any observation marked as “Accidental” was also removed from the data before any subsequent analysis, though only a small proportion of the data fell into this category.

# variable to indicate if raw data should be imported or (if false) cleaned csv should be used
import_raw_data = FALSE

if(import_raw_data){
# read in data from local csv file
ss = read_csv("../../data_1/school_shootings.csv")

# first row has column names
colnames(ss) = ss[1,]

ss = ss[-1,]%>%
  clean_names()%>%
  select(-c(na,na_2))

# columns after row 1426 were shifted in the original table
# read in second data frame to fix this
ss2 = read_csv("../../data_1/ss_pt2.csv")%>%
  select(-X1)
colnames(ss2)=colnames(ss)[34:ncol(ss)]

# update tail of dataframe
ss[(1426:nrow(ss)),(34:ncol(ss))]=ss2

# fix age class
ss$shooter_age = as.numeric(ss$shooter_age)

# create cleaned csv file
write.csv(ss,"ss_data.csv")
}

# ss is csv file with school shooting data (individual incidents) created above
ss = read_csv("ss_data.csv")%>%
  select(-X1)%>%
  mutate(year = as.numeric(substr(date, (nchar(date)-3),nchar(date))))%>%
  mutate(state = ifelse(state == "St. Croix, US Virgin Islands", 
                        "St.Croix",state))%>% #shorten name
  distinct_at(.vars = c(1:24), .keep_all = T)%>%
  filter(category != "Accident")%>% # remove accidents from dataset
  filter(reliability_score_1_5!=1) # remove unreliable data 
sts = unique(ss$state)
yrs = seq(range(ss$year)[1],range(ss$year)[2],by=1)

st_yr = data.frame(state = rep(sts,length(yrs)), year = sort(rep(yrs,length(sts))))%>%
  arrange(state)

State Grades (Gun Policy)

The main purpose of this analysis is to quantify the effect of gun policies on the number of school shooting incidents in each state. We found a dataset that contains an annual gun law score for each state over the years 2012 to 2018. The dataset was obtained from the Gifford Law Center. For each state the annual score is graded on an A to F scale, with a higher grade indicating more restrictive/stronger gun policies. We chose to transform these grades into a GPA on a 4.0 scale for later model fitting, where numerical values are more interpretable than categorical variables.

# read in state grade data
if(import_raw_data){
# one csv file per year

# 2018
grades18 = read_csv("../../data_1/2018grades.csv")%>%
  select(-c(X6,X7))%>%
  clean_names()%>%
  select(-gun_death_rate_per_100k)%>%
  mutate(year = "2018")%>%
  rename(grade = x2018_grade)%>%
  select(state,grade,year)

# 2017
grades17 = read_csv("../../data_1/2017grades.csv")%>%
  clean_names()%>%
  mutate(year = "2017")%>%
  rename(grade = x2017_grade)%>%
  select(state,grade,year)

# 2016
grades16 = read_csv("../../data_1/2016grades.csv")%>%
  clean_names()%>%
  mutate(year = "2016")%>%
  rename(grade = x2016_grade)%>%
  select(state,grade,year)

# 2015
grades15 = read_csv("../../data_1/2015grades.csv")%>%
  clean_names()%>%
  mutate(year = "2015")%>%
  rename(grade = x2015_grade)%>%
  select(state,grade,year)
# add in missing value
grades15 = rbind(grades15,c("Kansas","F",2015))

# 2014
grades14 = read_csv("../../data_1/2014grades.csv")%>%
  clean_names()%>%
  mutate(year = "2014")%>%
  rename(grade = x2014_grade)%>%
  select(state,grade,year)

# 2013
grades13 = read_csv("../../data_1/2013grades.csv")%>%
  clean_names()%>%
  mutate(year = "2013")%>%
  rename(grade = x2013_grade)%>%
  select(state,grade,year)

# 2012
grades12 = read_csv("../../data_1/2012grades.csv")%>%
  clean_names()%>%
  mutate(year = "2012")%>%
  rename(grade = x2012_grade)%>%
  select(state,grade,year)

# combine all year data 
grades = bind_rows(grades18,grades17)%>%
  bind_rows(grades16)%>%
  bind_rows(grades15)%>%
  bind_rows(grades14)%>%
  bind_rows(grades13)%>%
  bind_rows(grades12)

# write cleaned csv file
write.csv(grades,"state_grades.csv")
}



grades = read_csv("state_grades.csv")%>%
  select(-X1)

gpa_convert = data.frame(letter_grade = c("A","A-","B+","B","B-",
                                          "C+","C","C-","D+","D","D-","F"),
                         gpa = c(4,3.7,3.3,3,2.7,2.3,2,1.7,1.3,1,0.7,0))
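
The letter grades can then be mapped to GPAs by joining against this lookup table. Since the join itself is not shown above, here is a minimal self-contained sketch, with hypothetical demo rows standing in for the real grades data frame:

```r
library(dplyr)

# lookup table identical to gpa_convert above
gpa_convert = data.frame(letter_grade = c("A","A-","B+","B","B-","C+",
                                          "C","C-","D+","D","D-","F"),
                         gpa = c(4,3.7,3.3,3,2.7,2.3,2,1.7,1.3,1,0.7,0))

# hypothetical demo rows standing in for the real grades data frame
grades_demo = data.frame(state = c("California","Arizona"),
                         grade = c("A","F"), year = c(2018,2018))

# attach the numeric GPA to each state-year grade
grades_gpa = left_join(grades_demo, gpa_convert,
                       by = c("grade" = "letter_grade"))
# grades_gpa$gpa is c(4, 0)
```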

Gun Laws

This analysis also investigates the impact of specific types of gun laws separately, and thus we also turned to the RAND dataset. The dataset was obtained from the ______________ website. The data include information about changes in firearm laws in each state since 1979. The dataset contains variables describing the type of law, whether the change was restrictive or permissive, the effective date, and the end date. There are 14 types of laws in total, and each change is designated as either restrictive or permissive. For our analysis we count the number of laws in effect in each state and year. As some laws change in the middle of a year, dates are rounded to the nearest year (cutoff point of July 1st). Each permissive law is assigned the value -1 and each restrictive law the value +1. Then, for each category the permissive and restrictive laws are summed to obtain a final score for each state in each year. The different types of laws are further combined into the following four categories:

  1. Access laws: Minimum age, Prohibited Possessor, Child Access
  2. Screening law: Waiting period, Registration, Background checks, Dealer license, Safety training, permit, one gun per month
  3. Ban: Firearm ban
  4. Transport laws: Castle Doctrine, Open carrying not restricted, Carrying a concealed weapon
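
As a concrete illustration of the ±1 scoring described above, the following self-contained sketch (with hypothetical rows and column names, not the RAND file's actual schema) computes a net score per state, year, and category:

```r
library(dplyr)

# hypothetical law records: each row is one law in effect
laws_demo = data.frame(
  state    = c("TX","TX","TX","NY"),
  year     = c(2015,2015,2015,2015),
  category = c("Access","Access","Access","Ban"),
  effect   = c("Restrictive","Restrictive","Permissive","Restrictive"))

# restrictive laws count +1, permissive laws -1; sum within each group
net_scores = laws_demo %>%
  mutate(sign = ifelse(effect == "Restrictive", 1, -1)) %>%
  group_by(state, year, category) %>%
  summarise(score = sum(sign), .groups = "drop")
# TX/Access: +1 +1 -1 = 1; NY/Ban: 1
```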
# Do not execute because final product is already saved as a CSV


if(FALSE){
  # import the raw data
data.rand=read.csv("rand.laws.database.csv",stringsAsFactors = FALSE)

# combine law types by a simple set of phrases which all laws contain exactly one of 

types=c("background check","carrying","castle doctrine","child access","dealer license","age","registration","waiting period","prohibited possessor","safety training","ban","permit","open carry","one gun per")
match.df=as.data.frame(unique(data.rand$Type.of.Law))
match.df <- data.frame(lapply(match.df,as.character),stringsAsFactors=FALSE)
names(match.df)<-"type"
match.df$abv.name=0

i=1
j=1
flag=0
while(i<=nrow(match.df)){
      j=1
      while(j<=length(types)&flag==0){
            if(any(grep(types[j],match.df$type[i],ignore.case=TRUE))){
                  match.df$abv.name[i]=types[j]
                  flag=1
                  }
            else{
                  j=j+1
            }
      }
      flag=0
      i=i+1
}

names(data.rand)[5]="type"
data.rand=left_join(data.rand,match.df,by="type")
names(data.rand)[23]<-"type.of.law"

# Only keep laws which have non-missing values for the type of change and effect, and remove DC

data.rand=filter(data.rand,Type.of.Change=="Implement"|Type.of.Change=="Modify"|Type.of.Change=="Repeal")
data.rand=filter(data.rand,Effect!="")
data.rand=filter(data.rand,State!="District of Columbia"&State!="DIstrict of Columbia")

# Determine the years in which the law is in effect by rounding the start and end dates to the nearest year. For laws which are not superseded by newer laws, set the end date to 2020 (our analyses do not extend to 2020, so this is equivalent to saying the law never ended).

data.rand$Supercession.Date[data.rand$Supercession.Date==""]="1/1/2020"
data.rand$end.year=year(mdy(data.rand$Supercession.Date))
data.rand$end.month=month(mdy(data.rand$Supercession.Date))
data.rand$start.year=year(mdy(data.rand$Effective.Date))
data.rand$start.month=month(mdy(data.rand$Effective.Date))
data.rand$start.year=data.rand$start.year+ceiling(data.rand$start.month/6)-1
data.rand$end.year=data.rand$end.year+ceiling(data.rand$end.month/6)-1
data.rand=clean_names(data.rand)

year=2007
blank.df=NULL
while(year<=2019){
      blank.df=data.rand%>%filter(start_year<=year,end_year>year)%>%count(state,type_of_law,effect)%>%cbind(year)%>%rbind(blank.df)
      year=year+1
}
skel=expand.grid(unique(data.rand$state),unique(data.rand$type_of_law),unique(data.rand$effect),2007:2019)
skel=as.data.frame(skel)

# count the number of laws in effect for each state by year by type of law and effect (permissive or restrictive)

names(skel)<-c("state","type_of_law","effect","year")
skel$state=as.character(skel$state)
rand.count.data=merge(skel,blank.df,by=c("state","type_of_law","effect","year"),all=TRUE)
rand.count.data[is.na(rand.count.data$n),"n"]=0
names(rand.count.data)[5]<-"count"
rand.count.data=unite(rand.count.data,"law_effect",c("type_of_law","effect"),sep="_")%>%spread("law_effect","count")%>%clean_names()
write.csv(rand.count.data,"cleaned.rand.csv")
}


# data set to go from state names to abbreviations
states = data.frame(state.name,state.abb)
dc = data.frame(matrix(ncol= 2, nrow = 1))
colnames(dc)=colnames(states)
dc$`state.name` = "District of Columbia"
dc$`state.abb` = "DC"

states = rbind(states,dc)
states$state.name = as.character(states$state.name)


rand.count.data = read_csv("cleaned.rand.csv")%>%
  select(-X1)%>%
  left_join(states, by = c("state"= "state.name"))%>%
  select(-state)%>%
  rename(state = state.abb)

Mental Health

The literature review shows a strong influence of mental illness on school shooting occurrences. Thus we collected a dataset containing the prevalence of youths with severe mental disorders by state. The dataset was obtained from _____________. The dataset contains the percentage of youths who were diagnosed with major depression but did not receive any mental health treatment in each state over the years 2010 to 2016.

# cleaning mental health data; not executed because the result is already saved as a csv file

if(FALSE){
# read in each year's csv file
data.2016=read.csv("mha2016.csv",stringsAsFactors = FALSE)
data.2017=read.csv("mha2017.csv",stringsAsFactors = FALSE)
data.2018=read.csv("mha2018.csv",stringsAsFactors = FALSE)
data.2019=read.csv("mha2019.csv",stringsAsFactors = FALSE)
data.2020=read.csv("mha2020.csv",stringsAsFactors = FALSE)

names(data.2017)[2]<-"State"

data.2016=data.2016[,c("State","Percent","Year")]
data.2017=data.2017[,c("State","Percent","Year")]
data.2018=data.2018[,c("State","Percent","Year")]
data.2019=data.2019[,c("State","Percent","Year")]
data.2020=data.2020[,c("State","Percent","Year")]
states=sort(unique(data.2020$State))
# Remove DC
states=states[-9]
# Find which states don't match up for each year

states==sort(unique(data.2016$State))
sort(unique(data.2016$State))[8]
# remove DC from 2016 data
data.2016=filter(data.2016,State!="DC")      
states==sort(unique(data.2016$State))
data.2016$State[35]="Louisiana"
states==sort(unique(data.2016$State))
data.2017=filter(data.2017,State!="District of Columbia")
states==sort(unique(data.2017$State))
data.2018=filter(data.2018,State!="District of Columbia")
data.2018$State[9]="Wyoming"
states==sort(unique(data.2018$State))
data.2019=filter(data.2019,State!="District of Columbia"&State!="")
states==sort(unique(data.2019$State))

mh.data=rbind(data.2016,data.2017,data.2018,data.2019,data.2020)
mh.data$Year=mh.data$Year-1      
mh.data$Percent[51]=42.1
mh.data=filter(mh.data,State!="District of Columbia")
write.csv(mh.data,"mh.data.csv")
}



mh.data=read.csv("mh.data.csv",stringsAsFactors = FALSE)%>%
  clean_names()%>%
  select(-x)%>%
  left_join(states, by = c("state"="state.name"))%>%
  select(-state)%>%
  rename(state = state.abb)

CDC Survey

The literature reviews stipulate that mental health, school safety and bullying all play vital roles in school shootings. To analyze these dynamics, we gathered data from a CDC survey given to students in 37 states every other year from 1991 to 2015. The data were obtained from the R package yrbss. The dataset includes the proportion of students answering “yes” to a number of questions. We selected the following questions for inclusion in the analysis:

  1. Carried a weapon
  2. Carried a gun
  3. Carried a weapon on school property
  4. Did not go to school because they felt unsafe at school or on their way to or from school
  5. Were threatened or injured with a weapon on school property
  6. Were in a physical fight
  7. Were injured in a physical fight
  8. Were in a physical fight on school property
  9. Were bullied on school property
  10. Were electronically bullied
  11. Felt sad or hopeless
  12. Seriously considered attempting suicide
  13. Made a plan about how they would attempt suicide
  14. Attempted suicide
  15. Attempted suicide that resulted in an injury, poisoning, or overdose that had to be treated by a doctor or nurse
  16. Been teased because of weight in the past 12 months
  17. Been teased because of being labeled GLB in the past 12 months

Because many of these questions cover similar topics, we choose to combine the variables into four main categories and take weighted averages (weighted by the number of respondents) to obtain a final score for each category. The categories are classified as follows:

  1. Suicidal thoughts or actions (measure of mental health):
    • Felt sad or hopeless, Seriously considered attempting suicide
    • Made a plan about how they would attempt suicide
    • Attempted suicide
    • Attempted suicide that resulted in an injury, poisoning, or overdose.
  2. School safety :
    • Carried a weapon
    • Carried a gun
    • Carried a weapon on school property
    • Did not go to school because they felt unsafe at school or on their way to or from school
    • Were threatened or injured with a weapon on school property.
  3. Involvement in physical altercations:
    • Were in a physical fight
    • Were injured in a physical fight
    • Were in a physical fight on school property
  4. Bullying:
    • Were bullied on school property
    • Were electronically bullied
    • Been teased because of weight in the past 12 months
    • Been teased because of being labeled GLB in the past 12 months

Data for questions regarding bullying are not present in the dataset prior to 2010; we assume the survey was updated to include measures of bullying after its initial writing. The dataset also only contains data for every other year. To fill in the missing years, we impute each missing observation with the mean of the values in the years immediately preceding and following it. As the dataset only contains data up to 2015, we impute values for the years after 2015 by projecting a linear trajectory from the 2014 and 2015 values. This was done after exploratory data analysis showed that the proportions were relatively stable across years within a given state. To impute values for states not present in the dataset, we use the median score across all states present in the data for each year (we chose the median after seeing that the mean and median values in each year were nearly identical).

# questions 
qns = c("qn13","qn14","qn15","qn16","qn17","qn18","qn19","qn20","qn24","qn25","qn26","qn27","qn28","qn29","qn30","qnbullyweight","qnbullygay")
# participating states
sts = yrbss::getListOfParticipatingStates()
# years in data
yrs = c(1991,1993,1995,1997,1999,2001,2003,2005,2007,2009,2011,2013,2015)

if(import_raw_data){
  # make data frame to store values
bullying = data.frame(variable = NULL, state = NULL, year = NULL, prop= NULL, ciLB= NULL, ciUB = NULL, n = NULL)

# run over all questions, states, and years
for(v in qns){
  for(s in sts){
    for(y in yrs ){
      # call once and extract the components (previously four identical calls)
      res = yrbss::getProportionSingleVariable(.variable = v, .location = s, .year = y, .level = 0.95)
      newrow = data.frame(variable = v, state = s, year = y, prop = res$prop,
                          ciLB = res$ciLB, ciUB = res$ciUB, n = res$n)
    bullying = bind_rows(bullying,newrow)}
}}

# incorporate labels for questions (what the question actually is)
lab = data.frame(question = yrbss::yrbss_questions_binary$variable, label = yrbss::yrbss_questions_binary$label)%>%
   filter(question %in% qns)
 
bullying = left_join(bullying,lab, by = c(variable = "question"))
# clean Arizona label
bullying$state[which(bullying$state=="AZB")]="AZ"

# write cleaned csv file
write_csv(bullying, "bullying_survey.csv")
}

# incorporate labels for questions (what the question actually is)
lab = data.frame(question = yrbss::yrbss_questions_binary$variable, label = yrbss::yrbss_questions_binary$label)%>%
   filter(question %in% qns)

# read in csv
bullying_survey = read_csv("bullying_survey.csv")


# Make combined variables for survey questions (specified in text above)

# combine suicide questions 
suicide = data.frame(year = NULL, state = NULL, score = NULL, n = NULL)
suicide_qns = bullying_survey%>%
  filter(variable %in% c("qn26","qn27","qn28","qn29","qn30"))

sts[which(sts=="AZB")]="AZ"
for(y in yrs){
  for(st in sts){
    dat = filter(suicide_qns,(year ==y & state == st))
    # score is weighted average of proportions (weighted by number of respondents)
    suicide_score = weighted.mean(dat$prop,(dat$n/sum(dat$n)), na.rm = TRUE)
    samp = mean(dat$n,na.rm = TRUE)
    newrow = data.frame(year = y, state = st, score = suicide_score, n = samp)
    suicide = bind_rows(suicide,newrow)
  }
}

# combine questions involving physical fights
fight = data.frame(year = NULL, state = NULL, score = NULL, n = NULL)
fight_qns = bullying_survey%>%
  filter(variable %in% c("qn18","qn19","qn20"))

for(y in yrs){
  for(st in sts){
    dat = filter(fight_qns,(year ==y & state == st))
    # score is weighted average of proportions (weighted by number of respondents)
    fight_score = weighted.mean(dat$prop,(dat$n/sum(dat$n)), na.rm = TRUE)
    samp = mean(dat$n,na.rm = TRUE)
    newrow = data.frame(year = y, state = st, score = fight_score, n = samp)
    fight = bind_rows(fight,newrow)
  }
}

# school safety questions
safety = data.frame(year = NULL, state = NULL, score = NULL, n = NULL)
safety_qns = bullying_survey%>%
  filter(variable %in% c("qn13","qn14","qn15","qn16","qn17"))

for(y in yrs){
  for(st in sts){
    dat = filter(safety_qns,(year ==y & state == st))
    # score is weighted average of proportions (weighted by number of respondents)
    safety_score = weighted.mean(dat$prop,(dat$n/sum(dat$n)), na.rm = TRUE)
    samp = mean(dat$n,na.rm = TRUE)
    newrow = data.frame(year = y, state = st, score = safety_score, n = samp)
    safety = bind_rows(safety,newrow)
  }
}

# bullying questions
bull = data.frame(year = NULL, state = NULL, score = NULL, n = NULL)
bully_qns = bullying_survey%>%
  filter(variable %in% c("qn24","qn25","qnbullyweight","qnbullygay"))

for(y in yrs){
  for(st in sts){
    dat = filter(bully_qns,(year ==y & state == st))
    # score is weighted average of proportions (weighted by number of respondents)
    bully_score = weighted.mean(dat$prop,(dat$n/sum(dat$n)), na.rm = TRUE)
    samp = mean(dat$n,na.rm = TRUE)
    newrow = data.frame(year = y, state = st, score = bully_score, n = samp)
    bull = bind_rows(bull,newrow)
  }
}



# create table with new variables
survey_combined = bind_cols(suicide,safety, fight, bull)%>%
  select(year, state, score, n, score1, n1, score2, n2, score3, n3)%>%
  rename(suicide_score = score, school_safety_score = score1, fight_score = score2, bully_score = score3)%>%
  filter(year>=2007)%>% 
  full_join(st_yr%>%filter(year>=2007))%>%
  arrange(year)%>%
  arrange(state)%>%
  mutate(state = ifelse(state == "AZB","AZ",state))


# impute values
# average value of previous and next year (for years with such data)
# for 2016-2019 predict value from line between 2014 and 2015 points
survey_combined = survey_combined%>%
  group_by(state)%>%
  mutate(suicide_score =ifelse(year<2016,ifelse(is.na(suicide_score), na.locf(suicide_score)/2+na.locf(suicide_score, fromLast = TRUE)/2,suicide_score),
                              NA ))%>%
  mutate(school_safety_score =ifelse(year<2016,ifelse(is.na(school_safety_score), na.locf(school_safety_score)/2+na.locf(school_safety_score, fromLast = TRUE)/2,school_safety_score),NA))%>%
    mutate(fight_score =ifelse(year<2016,ifelse(is.na(fight_score), na.locf(fight_score)/2+na.locf(fight_score, fromLast = TRUE)/2,fight_score),NA))%>%
      mutate(bully_score =ifelse(year<2016,ifelse(is.na(bully_score), na.locf(bully_score)/2+na.locf(bully_score, fromLast = TRUE)/2,bully_score),NA))%>%
        mutate(n=ifelse(year<2016,ifelse(is.na(n), na.locf(n)/2+na.locf(n, fromLast = TRUE)/2,n),NA))%>%
          mutate(n1=ifelse(year<2016,ifelse(is.na(n1), na.locf(n1)/2+na.locf(n1, fromLast = TRUE)/2,n1),NA))%>%
          mutate(n2=ifelse(year<2016,ifelse(is.na(n2), na.locf(n2)/2+na.locf(n2, fromLast = TRUE)/2,n2),NA))%>%       
  mutate(n3=ifelse(year<2016,ifelse(is.na(n3), na.locf(n3)/2+na.locf(n3, fromLast = TRUE)/2,n3),NA))


# use the fitted line between the last two observed years to extrapolate 2016-2019 values
for(s in sts[which(!(sts %in% c("MA","AZB","DC")))]){
  dat = filter(survey_combined, state == s)
  ind = which(is.na(dat$suicide_score))[1]      # first year needing extrapolation
  slp = dat[(ind-1),3:10]-dat[(ind-2),3:10]     # slope between the two prior years
  for(k in 1:4){                                # fill 2016 through 2019 in turn
    na_ind = which(is.na(dat$suicide_score))[1]
    dat[na_ind,3:10] = dat[(ind-1),3:10]+as.numeric(dat[na_ind,1]-2015)*slp
  }
  survey_combined[which(survey_combined$state == s),]=dat
}

# check to see if mean and median are different
med_mn = survey_combined%>%
  group_by(year)%>%
  summarise(med_ss = median(suicide_score, na.rm = T), mn_ss = mean(suicide_score, na.rm = T),
            med_safe = median(school_safety_score, na.rm = T), mn_safe = mean(school_safety_score, na.rm = T),
            med_f = median(fight_score, na.rm = T), mn_f = mean(fight_score, na.rm = T),
            med_b = median(bully_score, na.rm = T), mn_b = mean(bully_score, na.rm = T))

# impute missing values with median value for each column

survey_combined = survey_combined%>%
  ungroup()%>%
  group_by(year)%>%
  mutate(suicide_score = ifelse(is.na(suicide_score), median(suicide_score, na.rm = T),suicide_score))%>%
  mutate(school_safety_score = ifelse(is.na(school_safety_score), median(school_safety_score, na.rm = T),school_safety_score))%>%
  mutate(fight_score = ifelse(is.na(fight_score), median(fight_score, na.rm = T),fight_score))%>%
  mutate(bully_score = ifelse(is.na(bully_score), median(bully_score, na.rm = T),bully_score))%>%
  mutate(n = ifelse(is.na(n), median(n, na.rm = T),n))%>%
  mutate(n1 = ifelse(is.na(n1), median(n1, na.rm = T),n1))%>%
  mutate(n2 = ifelse(is.na(n2), median(n2, na.rm = T),n2))%>%
  mutate(n3 = ifelse(is.na(n3), median(n3, na.rm = T),n3))%>%
  mutate(school_safety_score = ifelse(school_safety_score<0,0,school_safety_score))%>%
  mutate(fight_score = ifelse(fight_score<0,0,fight_score))

Poverty

The financial status of populations within each state may also influence the number of school shootings taking place each year. To account for this, we include the poverty prevalence rate for each state over the study period, measured as the percentage of the population living below the poverty line. The dataset was obtained from __________________. It contains the poverty rate and a 90% confidence interval for each state from 2007 to 2018.

# read in poverty data

pov=read.csv("census_poverty_data.csv",stringsAsFactors = FALSE)
pov=filter(pov,State!="      United States"&State!="     United States"& State!="District of Columbia"&State!="United States")

names(pov)[1:2]<-c("state","poverty")
names(pov)[4]<-"year"

skel=expand_grid(sort(unique(pov$state)),min(pov$year):(max(pov$year)+2))
names(skel)<-c("state","year")


# Each data point is a three-year average. We assign this average to all years that were combined to make the three-year average
sk1=filter(skel,year==2007|year==2008|year==2009)
sk1=left_join(sk1,filter(pov,year==2007),by="state")

sk2=filter(skel,year==2010|year==2011|year==2012)
sk2=left_join(sk2,filter(pov,year==2010),by="state")

sk3=filter(skel,year==2013|year==2014|year==2015)
sk3=left_join(sk3,filter(pov,year==2013),by="state")

sk4=filter(skel,year==2016|year==2017|year==2018)
sk4=left_join(sk4,filter(pov,year==2016),by="state")

pov=rbind(sk1,sk2,sk3,sk4)
pov=pov[,1:3]
names(pov)<-c("state","year","poverty")

Bullying Laws

Bullying in school has been identified as playing a significant role in school shootings. Some states have addressed this issue by implementing policies to prevent bullying in schools. Though almost all states have some form of legislation aimed at preventing bullying, the strength of these laws varies from state to state. We obtained a dataset that indicates how restrictive the bullying laws are in different states. The dataset was retrieved from stopbullying.gov. It lists 14 different components of anti-bullying laws that each state may or may not have incorporated into its legislation. Thus, we calculate a score out of 14 for each state indicating how many components it has incorporated. Unfortunately, the data were available only for the year 2010, so we include this variable in the analysis as a state effect that does not vary over time (assuming it either stays the same or changes equally across states).

bully = read_csv("State_bullying.csv")%>%
  select(c(total_score,state.abb))%>%
  rename(state = state.abb)%>%
  mutate(year = 2010)
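
The total_score read in above is, per the description, a count of how many of the 14 components a state's law includes. A hedged sketch of that computation, with hypothetical 0/1 component columns (the real file already stores the computed total):

```r
# hypothetical 0/1 indicators for three of the 14 law components
components_demo = data.frame(state      = c("CA","MT"),
                             definition = c(1,1),
                             scope      = c(1,0),
                             reporting  = c(1,0))

# the state's score is the number of components incorporated
components_demo$total_score = rowSums(components_demo[,-1])
# CA: 3, MT: 1
```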

Media

As the literature review indicates, we also wish to consider the impact of media coverage on school shooting rates. A dataset was retrieved from a Nature article detailing media coverage and firearm acquisition after mass shootings (data available on the following GitHub repository). The dataset reports the number of articles about shootings published in the Washington Post and The New York Times, as well as the number of mass shootings in the U.S., for every month from January 1999 through December 2017. The articles are further clustered by topic, such as “firearm laws and regulations,” “unemployment,” and “shootings excluding firearm laws and regulations.” We use the number of articles about shootings, excluding articles about laws and regulations, as a variable in our analysis. We consider this variable with a time lag, assuming (following our literature review) that the number of articles in the year leading up to an event may influence the occurrence of that event.

media = read_csv("media_data.csv")%>%
  separate(`Year-Month`, into = c("year","month"), sep = "-")%>%
  clean_names()%>%
  select(-starts_with("x"))%>%
  group_by(year) %>%
  mutate(yearly_articles = sum(articles_shootings_excluding_firearm_laws_and_regulations))%>% 
  mutate(yearly_shootings = sum(mass_shooting))%>%
  mutate(art_per_inc = yearly_articles/yearly_shootings)%>%
  ungroup()%>%
  mutate(prev_month_art = c(0,articles_shootings_excluding_firearm_laws_and_regulations[1:227]))%>%
  mutate(prev_year_art = c(rep(0,12),yearly_articles[1:216]))%>%
  mutate(year = as.numeric(year))

Population

It is well known that states vary widely in population size. Because of this, and because we are looking at event data with the number of events as our outcome of interest, we must take population into account in any analysis we do, since states with much larger populations are far more likely to experience a higher number of events than smaller states. We obtained state population data from the U.S. Census Bureau. The dataset contains the population of each state from 1970 to 2019.
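
One standard way to take population into account in count models is a log-population offset, so that covariate effects act on per-capita rates. A hedged sketch with simulated data (the variable names `shootings`, `gpa`, and `population` are hypothetical, not the ones used later in this analysis):

```r
set.seed(1)

# simulated state-year data: event counts, a covariate, and population
model_data = data.frame(shootings  = rpois(50, lambda = 2),
                        gpa        = runif(50, 0, 4),
                        population = runif(50, 5e5, 4e7))

# Poisson regression with offset(log(population)) models the per-capita rate:
# log E[shootings] = log(population) + b0 + b1*gpa
fit = glm(shootings ~ gpa + offset(log(population)),
          family = poisson, data = model_data)
```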

if(import_raw_data){

# 1980-1990
pop8090_raw = read.delim("https://www2.census.gov/programs-surveys/popest/tables/1980-1990/state/asrh/st8090ts.txt" )%>%data.frame()
pop8090_raw = pop8090_raw[c(8:58,64:114),]%>%data.frame()
colnames(pop8090_raw)="var"
pop8090 = data.frame(pop8090_raw$var[1:51],pop8090_raw$var[52:102])

colnames(pop8090)=c("var1","var2")


pop8090 = pop8090%>%
  separate(var1, c("state",as.character(seq(1980,1984,by = 1))), extra = "merge")%>%
  separate(var2, c("state1",as.character(seq(1985,1990,by = 1))), extra = "merge")%>%
  select(c(state,starts_with("19")))
  
# 1970 - 1980
pop7080_raw = read.delim("https://www2.census.gov/programs-surveys/popest/tables/1980-1990/state/asrh/st7080ts.txt")
pop7080_raw = pop7080_raw[c(10:61,63:114),]%>%data.frame()
colnames(pop7080_raw)="var"
pop7080 = data.frame(pop7080_raw$var[1:51],pop7080_raw$var[53:103])

colnames(pop7080)=c("var1","var2")

pop7080 = pop7080%>%
  separate(var1, c("blnk","id","state","1970","1971","1972","1973","1974","1975"), extra = "merge")%>%
  separate(var2, c("blnk1","id1","state1",as.character(seq(1976,1980,by = 1))), extra = "merge")%>%
  select(c(state,starts_with("19")))

# 1990-2000
pop90_00 = read_csv("../../data/pop_90-00.csv")%>%
  rename(state = X1)%>%
  select(c(state,3:13))%>%
  filter(state != "USA")
colnames(pop90_00)[2:ncol(pop90_00)]=as.character(seq(1990,2000,by = 1))
pop90_00$state[30:35] = states$state.name[29:34]
pop90_00$state[40:42] = states$state.name[39:41]
pop90_00$state[9] = states$state.name[51]
pop90_00$state[49] = states$state.name[48]

# 2000-2010
pop00_10 = read.csv("https://www2.census.gov/programs-surveys/popest/datasets/2000-2010/intercensal/state/st-est00int-agesex.csv")%>%
  clean_names()%>%
  filter(sex == 0, age == 999)%>%
  select(c(name,starts_with("popestimate")))%>%
  rename_at(vars(starts_with("popestimate")), ~substr(., start = 12, stop = 15))%>%
  rename(state = name)%>%
  mutate(state = as.character(state))%>%
  mutate(state = ifelse(state=="District of Columbia",as.character(states$state.name[51]),state))%>%
  filter(state != "United States")

# 2010-2018
pop10_18 = read_csv("../../data/pop_2010-2018.csv")
colnames(pop10_18)=pop10_18[1,]

pop10_18 = pop10_18%>%
  slice(-1)%>%
  rename_at(vars(starts_with("Pop")), ~substr(., start = 38, stop = 41))%>%
  rename(state = Geography)%>%
  select(c(state,starts_with("20")))%>%
  mutate(state = ifelse(state=="District of Columbia",as.character(states$state.name[51]),state))

# 2019

pop2019 = read_csv("../../data/2019pop.csv")%>%
  clean_names()%>%
  select(c(state,pop))%>%
  rename("2019"=pop)


# combine all data
pops = left_join(pop7080,pop8090%>%select(-c("1980")), by = "state")%>%
  left_join(states,by = c("state"="state.abb"))%>%
  mutate(state = state.name)%>%
  select(-state.name)%>%
  left_join(pop90_00%>%select(-c("1990")), by = "state")%>%
  left_join(pop00_10%>%select(-c("2000")), by = "state")%>%
  left_join(pop10_18%>%select(-c("2010")), by = "state")%>%
  left_join(pop2019,by = "state")%>%
  gather(key = "year", value = "population", c(2:51))

# write cleaned csv file
write.csv(pops, "state_populations.csv")

}

# read in population data (include abbreviated and full state name)
pops = read_csv("state_populations.csv")%>%
  select(-X1)%>%
  left_join(states, by = c("state"="state.name"))

Aggregating the data

Because of the nature of the school shooting data (each row is one event), we must aggregate the data into a form that we can analyze. We create a new data set with each row representing the data for one state in a given year, with number of school shootings in that state/year as the main outcome of interest. We then merge all other data sets to get the final form of the data we will use for our remaining analysis.

# data set aggregated by state and year
ss_counts = ss%>%
  group_by(state,year) %>%
  summarize(incident_count = n(), 
            total_victims = sum(total_injured_killed_victims), #include possible secondary outcomes
            total_fatalities = sum(killed_includes_shooter),
            total_wounded = sum(wounded),
            avg_victims = mean(total_injured_killed_victims),
            avg_fatalities = mean(killed_includes_shooter),
            ave_wounded = mean(wounded),
            avg_shooter_age = mean(shooter_age),
            max_shooter_age = max(shooter_age),
            min_shooter_age = min(shooter_age),
            avg_reliability = mean(reliability_score_1_5),
            target = mean(ifelse(targeted_specific_victim_s=="Y",1,
                                 ifelse(targeted_specific_victim_s=="N",0,NA)),na.rm = T),
            random_vict = mean(ifelse(random_victims=="Y",1,
                                      ifelse(random_victims=="N",0,NA)),na.rm = T),
            bullied = mean(ifelse(bullied_y_n_n_a=="Y",1,
                                  ifelse(bullied_y_n_n_a=="N",0,NA)),na.rm = T),
            domestic_violence = mean(ifelse(domestic_violence_y_n=="Y",1,
                                            ifelse(domestic_violence_y_n=="N",0,NA)),na.rm = T))%>%
  mutate(in_ss = TRUE)%>%
  full_join(st_yr, by = c("state","year"))%>%
  left_join(pops%>%rename(full_state_name = state), by = c("state"="state.abb", "year"))%>%
  mutate(incident_count = ifelse(is.na(incident_count),0,incident_count))%>%
  mutate(in_ss = ifelse(is.na(in_ss),FALSE,in_ss))

# merge in grade data
ss_counts = ss_counts%>%
  left_join(grades%>%select(state,grade,year),by = c("full_state_name"="state","year"))%>%
  left_join(gpa_convert, by = c("grade"="letter_grade"))

# merge in media data
ss_counts = ss_counts%>%
  left_join(media%>%select(prev_year_art, year,yearly_articles)%>%distinct(), by = "year")

# merge in poverty data
pov = left_join(pov,states, by = c("state"="state.name"))%>%
  select(-state)%>%
  rename(state = state.abb)
ss_counts = left_join(ss_counts, pov, by = c("state","year"))

# merge bullying data
ss_counts = left_join(ss_counts, bully%>%select(-year), by = "state")%>%
  rename(bullying_score = total_score)%>%
  mutate(bullying_score = as.numeric(bullying_score))

# merge mental health data
ss_counts = ss_counts%>%
  left_join(mh.data, by = c("state","year"))%>%
  rename(mh_score = percent)

# merge CDC survey data
ss_counts = left_join(ss_counts, survey_combined, by = c("state","year"))%>%
  filter(state != "DC")

# merge RAND data
ss_counts = left_join(ss_counts,rand.count.data, by = c("state","year"))

# create new variables of law categories
# multiply by -1 to correct an earlier sign error (permissive laws had been coded as positive and restrictive laws as negative)
ss_counts = ss_counts%>%
  mutate(access_laws = -1*(age_permissive-age_restrictive+child_access_permissive-child_access_restrictive+prohibited_possessor_permissive-prohibited_possessor_restrictive))%>%
  mutate(transport= -1*(castle_doctrine_permissive-castle_doctrine_restrictive+open_carry_permissive-open_carry_restrictive+carrying_permissive-carrying_restrictive))%>%
  mutate(screening = -1*(background_check_permissive- background_check_restrictive + waiting_period_permissive- waiting_period_restrictive + permit_permissive - permit_restrictive + registration_permissive - registration_restrictive + safety_training_permissive - safety_training_restrictive + dealer_license_permissive - dealer_license_restrictive + one_gun_per_permissive - one_gun_per_restrictive))%>%
  mutate(ban = -ban_permissive+ban_restrictive)%>%
  select(-c(36:63))

EDA

Once all data have been obtained, cleaned, and merged, we next turn to making exploratory plots to understand the nature of the data and look for trends that will be important in later modeling and statistical analysis. First, we look at the trend of school shootings over time by plotting the number of incidents per year. We can see that school shooting incidents have generally been increasing steadily over time, with a large spike in the latter half of the past decade. (Note that the last date recorded in our data is November 11th, 2019, so we do not have a full year of data for 2019.)

ss_counts%>%
  group_by(year)%>%
  summarise(ss = sum(incident_count))%>%
  ggplot(aes(x = year, y = ss))+
  geom_line(color = "red")+
  labs(x = "Year", y = "School Shootings", title = "School Shootings in the U.S.")+
  theme_bw()+
  theme(plot.title = element_text(hjust = 0.5))

ss_counts%>%
  mutate(grade = substr(grade,1,1))%>% # get rid of plus/minus grades (round to letter grade)
  group_by(grade)%>%
  summarise(count = sum(incident_count), # total incidents in states with same grade (all years)
            pop = sum(population))%>% # total population in states with same grade (all years)
  filter(!is.na(grade))%>% # only years with grade value
  ggplot(aes(x = grade, y = (count/pop)*330000000, fill = grade))+
  geom_bar(stat = "identity")+
  scale_fill_viridis_d(direction = -1)+
  theme_classic()+
  labs(y = "Normalized School Shooting Count", x = "Grade", title = "School Shootings by Gun Policy Strength")+
  theme( legend.position = "none", plot.title = element_text(hjust = 0.5))

Gun Policy by State

We next look at how these school shootings occur with respect to gun policy. Below is a plot showing the number of school shootings in each state for the years 2012 to 2018, where each state is colored by the gun policy grade it received that year (grades are rounded to letter grades, dropping +/-). As we can see, in the early years the states with poor gun policy grades account for most school shootings. Interestingly, in 2018 there seems to be a trend reversal, with states receiving high grades experiencing a high number of school shootings.

# This will handle the ordering
d <- ss_counts %>% 
  filter(year %in% seq(2012,2018,by=1))%>%
  mutate(grade = substr(grade,1,1))%>%
  ungroup() %>%   # As a precaution / handle in a separate .grouped_df method
  arrange(year, grade) %>%   # arrange by facet variables and continuous values
  mutate(.r = row_number()) # Add a row number variable

ggplot(d, aes(x = .r, y= incident_count, fill = grade)) +  # Use .r instead of x
  geom_point(data = d%>%filter(incident_count==0),
             aes(x = .r, y= incident_count, color = grade),
             size = 0.7) +
  geom_bar(stat = "identity")+
  facet_wrap(~ year, scales = "free_x") +  # Should free scales (though could be left to user)
  scale_x_continuous(  # This handles replacement of .r for x
    breaks = d$.r,     # notice need to reuse data frame
    labels = d$state
  )+
  labs(title = "Incident Count by Grade", x = "State", y = "School Shootings")+
  scale_fill_viridis_d( name = "Grade", direction = -1)+
  scale_color_viridis_d( guide = FALSE, direction = -1)+
  theme_bw()+
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 3.5), legend.position = c(0.7,.15), legend.direction = "horizontal",legend.key.size = unit(1.5,"line"), legend.text = element_text(size = 8), legend.title = element_text(size = 10),plot.title = element_text(hjust = 0.5))

We next look at how state gun policy grades and the number of gun laws in effect (by category) change over time. We plot the grades on a 4.0 GPA scale. The number of gun laws in effect is calculated for each state and year as \(\#\text{restrictive laws} - \#\text{permissive laws}\). In these animations we can see that while some states' grades and laws experience large changes over time, many do not (this can be seen particularly well in the ban and transport laws, which seem relatively stable over the period 2007-2019).

# variable to create animations (takes a few seconds so I saved files in the directory rather than building them each time I knit the code)
build_animations = FALSE

if(build_animations){
# convert letter grade to gpa
grades = left_join(grades, gpa_convert, by = c("grade"="letter_grade"))

# get long/lat data
dat_loc= map_data("state")%>%
  mutate(sts=toupper(region))

# merge data
b=full_join(grades%>%mutate(state = tolower(state)), dat_loc, by=c("state"="region"))
b = b%>%mutate(year = as.numeric(year))%>%filter(!is.na(year))%>%select(-subregion)
b = b[complete.cases(b),]

# create plot to animate
map_plts = ggplot(data = b, aes(x = long, y = lat, group = group))+
  geom_polygon(aes(fill = gpa), color = "white")+
  scale_fill_viridis(option = "D", name = "GPA")+
  labs(title = "Grade Changes by Year", subtitle = 'Year: {frame_time}', x = NULL, y = NULL)+
  theme_bw()+
  theme(line = element_blank(), axis.text.x = element_blank(), axis.text.y = element_blank(), plot.title = element_text(hjust = 0.5))+
  transition_time(year, range = c(2012,2018))

# run animation
animate(map_plts, nframes = 7, fps = 1)

# save as file to save time when running next time
anim_save("state_grade_animation.gif")
}
if(build_animations){
# long/lat data
dat_loc = left_join(dat_loc,states%>%mutate(state.name = tolower(state.name)), by = c("region"="state.name"))
b2=full_join(ss_counts%>%ungroup()%>%filter(year>=2007)%>%filter(!(state%in%c("AK","HI")))%>%select(state,year,access_laws,transport,screening,ban), dat_loc, by=c("state"="state.abb"))%>%
  select(-subregion)%>%
  gather(key = "law_class", value = "number_in_effect", c(3:6))
b2 = b2[complete.cases(b2),]

# facet labels for cleaner plots
law.labs <- c("Access Laws", "Bans","Screening Laws", "Transport Laws")
names(law.labs) <- c("access_laws", "ban","screening","transport")

# create plots for animation
map_plts_law = ggplot(data = b2, aes(x = long, y = lat, group = group))+
  geom_polygon(aes(fill = number_in_effect), color = "white")+
  scale_fill_viridis(option = "D", name = "Laws in\nEffect\n(R-P)")+
  labs(title = "Law Changes by Year",subtitle = 'Year: {frame_time}', x = NULL, y = NULL)+
  theme_bw()+
  facet_wrap(.~law_class, labeller = labeller(law_class = law.labs))+
  theme(line = element_blank(), axis.text.x = element_blank(), axis.text.y = element_blank(), plot.title = element_text(hjust = 0.5))+
  transition_time(year, range = c(2007,2019))

# run animation 
animate(map_plts_law, nframes = 13, fps = 1)

# save file
anim_save("state_law_animation.GIF")
}

Media Coverage

Another variable we wish to control for is media coverage of shooting events in the United States. Below we see that as the years have progressed, there has been a general increase in the number of articles published about shootings. Here we use the number of articles in The New York Times and The Washington Post as a proxy for general media coverage trends. We hypothesize that the increase in media coverage of mass shootings may help explain the increase in school shootings, as increased public awareness of these shootings may unintentionally normalize these events or even inspire further violence, as indicated in our literature review. Hence, we also plot the number of school shootings colored by the number of articles about shootings published in the previous year. If this hypothesis were true, we would expect to see higher bars (indicating more school shootings) in years preceded by years with many published articles about shootings. While there does appear to be some such trend in the data, it is not conclusive.

# plot number of articles per month over time 
media%>%
    mutate(date = as.Date(zoo::as.yearmon(paste(year,"-",month, sep = ""))))%>%
  ggplot(aes(y = articles_shootings_excluding_firearm_laws_and_regulations, x= date, color = year))+
  geom_jitter()+ # a single jittered layer (a separate geom_point layer would duplicate the points)
  labs(y = "Monthly Articles",x =  "Year", title = "Media Coverage of Shootings", subtitle = "Articles in NYT and Washington Post Regarding Shootings")+
  scale_color_viridis_c(name = "Year")+
  theme_bw()+
  theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5, size = 8))

# plot school shooting counts per year colored by number of articles in previous year
ss_counts%>%
  group_by(year)%>%
  summarise(yearly_incidents = sum(incident_count), prev_year_art = prev_year_art[1] )%>%
  filter(year %in% seq(1999,2017, by = 1))%>%
  ggplot(aes(x = year, y = yearly_incidents, fill = prev_year_art))+geom_bar(stat = "identity")+
  labs(y = "Yearly School Shootings", x = "Year", title = "Media Coverage Lag Effect")+
  scale_fill_viridis(option = "D", name = "Articles in previous year")+
  theme_bw()+
  theme(plot.title = element_text(hjust = 0.5))

YRBSS Survey

Next we look at student responses from the CDC YRBSS surveys. We plot the (non-imputed) data using the aggregated variables we defined previously (4 categories: bullying, physical altercations, school safety, suicide) to see trends across time and states. These scores seem to stay relatively consistent across time with only minor changes, validating our choice to impute values with the mean of adjacent years and to extrapolate along a linear trajectory after 2015. There seems to be more variability across states (see boxplots). It is important to note that this suggests that our choice to impute data for unobserved states with the median of all other states' data (by year) will underestimate variance. This will be important to recall when interpreting the model results for these variables.
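The imputation scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to build `survey_combined`: it assumes a long-format data frame with `state`, `year`, and a single `score` column, the helper name `impute_scores` is hypothetical, and the post-2015 linear extrapolation is omitted for brevity.

```r
# Sketch of the survey-score imputation: off-years within a state are filled
# with the mean of the adjacent observed years, and states with no observation
# in a year are filled with the median across observed states for that year.
library(dplyr)

impute_scores = function(df) {
  df %>%
    arrange(year) %>%
    group_by(state) %>%
    # off-years within a state: mean of adjacent observed years
    mutate(score = ifelse(is.na(score),
                          (lag(score) + lead(score)) / 2,
                          score)) %>%
    ungroup() %>%
    group_by(year) %>%
    # states unobserved in a year: median of the other states' scores
    mutate(score = ifelse(is.na(score),
                          median(score, na.rm = TRUE),
                          score)) %>%
    ungroup()
}
```

Filling unobserved states with the yearly median is exactly why the imputed data underestimate between-state variance, as noted above.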

# retrieve original data without imputed values
survey_combined_raw = bind_cols(suicide,safety, fight, bull)%>%
  select(year, state, score, n, score1, n1, score2, n2, score3, n3)%>%
  rename(suicide_score = score, school_safety_score = score1, fight_score = score2, bully_score = score3)%>%
  filter(year>=2007)%>%
  full_join(st_yr%>%filter(year>=2007))%>%
  arrange(year)%>%
  arrange(state)%>%
  mutate(state = ifelse(state == "AZB","AZ",state))

# cleaner facet labels
surv.labs = c("Bullying Score", "Physical Altercation Score", "School Danger Perception Score", "Suicidal Thoughts or Actions Score")
names(surv.labs)= c("bully_score", "fight_score", "school_safety_score","suicide_score")

# plot survey responses over time 
survey_combined_raw%>%
  gather(key = "score_type", value = "percent_yes", c(3,5,7,9))%>%
  ggplot(aes(x = year, y = percent_yes, color = score_type))+
  geom_point(position = "jitter")+
  geom_smooth(se=F)+
  scale_color_viridis_d(name = "Score Type", labels = c("Bullying\n","Involvement in \nPhysical Altercation\n","School Danger\nPerception\n","Suicidal \nThoughts/Actions\n"))+
  xlim(2007,2015)+
  labs(y = "% Responding Yes", x = "Year", title = "Survey Responses", subtitle = "(Aggregated Variables)")+
  theme_bw()+
  theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5, size = 8))

# plot survey responses variation by state
survey_combined_raw%>%
  gather(key = score_type, percent, c(3,5,7,9))%>%
  ggplot(aes(x = state, y = percent, color = score_type ))+
  geom_boxplot()+
  scale_color_viridis_d()+
  labs(title = "Survey Responses by State", x = "State", y = "Percent Answering 'Yes'")+
  facet_wrap(.~score_type,labeller = labeller(score_type = surv.labs))+
  theme_bw()+
  theme(legend.position = "none",axis.text.x = element_text(angle = 45, hjust = 1, size = 4.5),plot.title = element_text(hjust = 0.5))

Mental Health Data

When we look further into the mental health dataset, we find several issues with these data. For one, the survey years stated on the Mental Health America website for each year's report sometimes overlap but do not share the same values. Further, the reported percentages do not always change from year to year. The plot below shows an example of this, using the year prior to the report as the survey year.

mh.data=mh.data%>%clean_names()
ggplot(data=mh.data,mapping=aes(x=year,y=percent,col=state))+
  geom_line()+
  labs(title="Percent of Youth Who Experienced an MDE \nand Did Not Receive Treatment",x="Year",y="Percent")+
  scale_color_discrete(name = "State")+
  theme_bw()+
  theme(legend.text = element_text(size = 5), legend.key.size = unit(.5,"line"), legend.title.align = 0.5)

Note that 2015 and 2016 show no change in values. We also notice that there does not appear to be any pattern, by state or over time, in the percent of youth experiencing a major depressive episode and not receiving treatment. This observation, together with the lack of clarity about the appropriate survey years and the fact that two years' reports shared identical values, led us to conclude that these data should not be included in our models.

Model Fitting

We next turn to fitting a model to explain the incidence of school shootings in the United States. We first choose to fit a Poisson generalized linear model (GLM) with school shooting count (by state and year) as the outcome of interest. Poisson GLMs are commonly used for count data, making them a natural starting point. Our choice was then validated by a number of model diagnostic and goodness-of-fit tests (presented below).
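Concretely, a Poisson GLM with the canonical log link assumes

\[I \sim \text{Poisson}(\mu), \qquad \log(\mu)=\mathbf{x}^\top\boldsymbol{\beta},\]

so covariate effects are multiplicative on the expected count, which is why exponentiated coefficients can be interpreted as rate ratios in the results that follow.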

Model Using State Grades

We include the following variables in the model (each varies by time and/or state):

  • State gun policy grade (converted to 4.0 GPA scale)
  • Percent of population living below the poverty line
  • Strength of anti-bullying laws
  • Media coverage in the previous year (number of NYT and WP articles)
  • Time (year)
  • Survey suicide score
  • Survey physical altercations score
  • Survey school safety score (student perception of danger)
  • Survey bullying score (proportion of students experiencing bullying)
  • Population (log scale)

Note: population enters not as a covariate but as an offset term in the model (its coefficient is fixed at 1 rather than estimated). We also have only one year's worth of data for the "strength of anti-bullying laws" variable, so we treat it as a fixed state effect (it does not vary over time). The media data vary by year only, not by state.

More formally, the model we fit is:

\[\log(E[I_{t,s}])=\beta_0+\beta_1\cdot g_{t,s}+\beta_2\cdot p_{t,s}+\beta_3\cdot l_{s}+\beta_4\cdot b_{t,s}+\beta_5\cdot s_{t,s}+\\\beta_6\cdot y_{t,s}+\beta_7\cdot f_{t,s}+ \beta_8\cdot t+ \beta_9 \cdot m_{t-1} +\log( n_{t,s}) \]

where here for each state \(s\) and year \(t\) we have that \(I_{t,s}\) is the school shooting incident count, \(g_{t,s}\) is the grade, \(p_{t,s}\) is the percent of the population living in poverty, \(b_{t,s}\) is the survey-derived bullying score, \(l_{s}\) is the state anti-bullying law score (does not vary by year), \(s_{t,s}\) is the survey-derived suicide score, \(t\) is time (year), \(f_{t,s}\) is the survey-derived physical altercation score, \(y_{t,s}\) is the survey-derived school danger score, \(m_{t-1}\) is the media score (number of articles) in the previous year, and \(n_{t,s}\) is the population.
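The role of the population offset can be made explicit by rearranging the model. Writing \(\eta_{t,s}\) for the linear predictor without the offset term, fixing the coefficient of \(\log(n_{t,s})\) at 1 gives

\[\log(E[I_{t,s}]) = \eta_{t,s} + \log(n_{t,s}) \quad\Longleftrightarrow\quad \log\!\left(\frac{E[I_{t,s}]}{n_{t,s}}\right) = \eta_{t,s},\]

so the model describes the per-capita school shooting rate rather than the raw count, putting states of different sizes on a comparable scale.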

Model Results

Below we show the model output. One can see that the only significant covariate in this model is GPA (state gun policy grade). The estimate is negative, as we would hope, indicating that higher GPA (stricter gun laws) is associated with fewer school shootings.

# fit model written above
fit_grade = glm(incident_count~gpa+poverty+bullying_score+ prev_year_art+year+suicide_score+fight_score+school_safety_score+bully_score+ offset(log(population)), family = "poisson", data = ss_counts)

# table with model output
kbl = broom::tidy(fit_grade)
kbl$term = c("Intercept","GPA", "Poverty", "Bullying Laws score", "Articles in previous year", "Year","Suicide score", "Fight score" , "School safety score", "Bullying score")
kableExtra::kable( kbl, align = "c" , digits = 4, caption = "GLM Output Using State Policy Grades as Covariates")%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
GLM Output Using State Policy Grades as Covariates
term estimate std.error statistic p.value
Intercept -73.6027 117.6988 -0.6253 0.5317
GPA -0.3189 0.0707 -4.5135 0.0000
Poverty -0.0396 0.0305 -1.2967 0.1947
Bullying Laws score 0.0841 0.0513 1.6380 0.1014
Articles in previous year 0.0005 0.0003 1.5367 0.1244
Year 0.0283 0.0584 0.4846 0.6280
Suicide score 3.4774 4.8977 0.7100 0.4777
Fight score -3.5257 2.6116 -1.3500 0.1770
School safety score -3.7020 3.9987 -0.9258 0.3546
Bullying score 1.7618 3.4415 0.5119 0.6087
# 1- to show estimated drop
exp_change = 1-round(exp(-.3189), digits = 3)

# C.I.
l = 1-round(exp(-.3189 + 1.96 *.0706502 ), digits = 3)
u = 1-round(exp(-.3189 - 1.96 *.0706502 ), digits = 3)


# change from F to A grade (4 point increase in gpa) (1- to show estimated drop)
f_to_a = 1-round(exp(4*-.3189), digits = 4)

# C.I.
l4 = 1-round(exp(4*-.3189 + 1.96*4*.0706502 ), digits = 3)
u4 = 1-round(exp(4*-.3189 - 1.96*4*.0706502 ), digits = 3)

From this model we predict that with a one-unit increase in GPA, a state would expect to see a 27.3% reduction (95% C.I. (16.5%, 36.7%)) in school shootings. The model also implies that increasing GPA by 4 points (going from an F to an A) yields an expected drop in school shootings of 72.1% (95% C.I. (51.4%, 84%)). For context, the average number of school shootings per state per year in the data is 0.516.

Model Checking

To check the model, we run a number of diagnostic tests. First, we perform a deviance goodness-of-fit test and obtain a p-value of \(p=0.838\) against the null hypothesis that the model fits well (no evidence against the model). We also test for overdispersion and obtain \(p = 0.945\) against the null of no overdispersion (no evidence against the null), with an estimated dispersion parameter of -0.1. Lastly, we examine diagnostic plots and see that the model fits reasonably well.

#check model fit using deviance
p_fit_grade = 1-pchisq( fit_grade$deviance, df = fit_grade$df.residual)

# check for overdispersion
disp_test = AER::dispersiontest(fit_grade,trafo=1)
p_disp = disp_test$p.value
disp_est = disp_test$estimate

# diagnostic plot
plot(DHARMa::simulateResiduals(fit_grade))

GEE

Finally, we wish to see whether there is between-state variability, not already captured by the variables in our previous model, that could be affecting school shooting counts. To do this we apply Generalized Estimating Equations (GEE), taking state as the grouping factor. We use an autoregressive correlation structure within states to account for fading lag effects across years (years closer in time are assumed to be more highly correlated than distant years). We can see below that even when taking state variation into account, GPA still has a significant effect in reducing school shootings. The results are very similar to our previous GLM results.
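Concretely, the AR(1) working correlation assumes that within state \(s\), counts in years \(t\) and \(t'\) satisfy

\[\mathrm{Corr}(I_{t,s},\, I_{t',s}) = \alpha^{|t-t'|}\]

for a single estimated correlation parameter \(\alpha\), so the assumed correlation decays geometrically with the gap between years: adjacent years are most strongly correlated and distant years are nearly independent.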

# fit gee
gee_dat_grade = ss_counts%>%
  select(c(1,2,3,20,22,23,25,26,28,30,32,34))

gee_grade = geeglm(incident_count~gpa+poverty+bullying_score+ prev_year_art+year+suicide_score+fight_score+school_safety_score+bully_score,family=poisson("log") ,offset = log(population),id=as.factor(state), corstr = "ar1", data=gee_dat_grade[complete.cases(gee_dat_grade),]) 
# n = 300 observations

# table with model output
kbl_gee = broom::tidy(gee_grade)
kbl_gee$term = c("Intercept","GPA", "Poverty", "Bullying Laws score", "Articles in previous year", "Year","Suicide score", "Fight score" , "School safety score", "Bullying score")
kableExtra::kable( kbl_gee, align = "c" , digits = 4, caption = "GEE Output Using State Policy Grade as Covariates")%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"),full_width = F)
GEE Output Using State Policy Grade as Covariates
term estimate std.error statistic p.value
Intercept -132.2346 118.1578 1.2525 0.2631
GPA -0.2778 0.1006 7.6285 0.0057
Poverty -0.0270 0.0475 0.3247 0.5688
Bullying Laws score 0.0722 0.0771 0.8787 0.3486
Articles in previous year 0.0003 0.0003 0.6110 0.4344
Year 0.0572 0.0585 0.9566 0.3281
Suicide score 1.9226 4.8621 0.1564 0.6925
Fight score -1.4986 3.5232 0.1809 0.6706
School safety score -2.5324 3.3665 0.5659 0.4519
Bullying score 4.4356 4.2274 1.1009 0.2941

Below we plot the predicted number of school shootings per year (summed over all states) based on our model fit, compared to the actual data. We also plot (by color) the predicted number of school shootings if all states had gun laws of equal strength, ranging from all A's to all F's. We predict using the actual data for all other covariates, imposing for each letter grade that all states share the same gun policy grade, to see the effect these changes would have. As we can see below, our model fits fairly well when comparing predicted to actual values. Unsurprisingly, with each drop in letter grade (from A toward F), the estimated annual school shooting count increases.

# plot predicted number of counts for different grades
pred_a = predict(fit_grade, ss_counts%>%filter(!is.na(grade)&!is.na(prev_year_art))%>%mutate(gpa = 4.0), type = "response")
pred_b = predict(fit_grade, ss_counts%>%filter(!is.na(grade)&!is.na(prev_year_art))%>%mutate(gpa = 3.0), type = "response")
pred_c = predict(fit_grade, ss_counts%>%filter(!is.na(grade)&!is.na(prev_year_art))%>%mutate(gpa = 2.0), type = "response")
pred_d = predict(fit_grade, ss_counts%>%filter(!is.na(grade)&!is.na(prev_year_art))%>%mutate(gpa = 1.0), type = "response")
pred_f = predict(fit_grade, ss_counts%>%filter(!is.na(grade)&!is.na(prev_year_art))%>%mutate(gpa = 0), type = "response")



i = data.frame(model_fit = fit_grade$fitted.values, 
               year = ss_counts%>%ungroup()%>%filter(!is.na(grade)&!is.na(prev_year_art))%>%select(year), 
  state = ss_counts%>%ungroup()%>%filter(!is.na(grade)&!is.na(prev_year_art))%>%select(state), 
  ss_counts%>%ungroup()%>%filter(!is.na(grade)&!is.na(prev_year_art))%>%select(incident_count), 
  grade = ss_counts%>%ungroup()%>%filter(!is.na(grade)&!is.na(prev_year_art))%>%select(grade),
  A = pred_a,
B = pred_b,
C = pred_c,
D = pred_d,
`F` = pred_f
)


newdat = i%>%
  gather(key = new_grade, value = pred, c(6:10))%>%
  group_by(year, new_grade)%>%
  summarise(inc = sum(incident_count),
            pred_inc=sum(pred))


i%>%
  mutate(grade = substr(grade,1,1))%>%
  group_by(year)%>%
  summarise(inc = sum(incident_count),
            pred_inc = sum(model_fit))%>%
  ggplot(aes(x = year))+
  geom_line(data = newdat, aes(x = year, y = pred_inc, color = new_grade))+
  labs(x = "Year", y = "Number of School Shootings", title = "Number of School Shootings")+
  scale_color_viridis_d(direction = -1, name = "Hypothetical \nGrade")+
  geom_line(aes(y = inc, linetype = "Actual Events"), color = "grey47")+
  geom_line(aes(y = pred_inc , linetype = "Model Predicted \nValues"), color = "grey47")+
  scale_linetype_discrete(name = NULL)+
  theme_bw()+
  theme(plot.title = element_text(hjust = 0.5))

Model Using State Laws

Because we found a statistically significant effect in the model using state gun policy grade, we wish to look further into this trend by examining which types of laws might be driving the differences in school shooting rates. Thus, we substitute the "grade" variable in the above model with the combined law count variables described in an earlier section (four categories: gun access laws, bans, screening laws, and transport laws). Recall that these variables are the number of restrictive laws in effect minus the number of permissive laws in effect for a given state and year (policies beginning or ending in the middle of the year are rounded to the nearest year). Thus, the model is the same except that we split the variable accounting for state gun policy into four more specific variables (one for each law type category) and estimate a separate coefficient for each. That is,

\[\log(E[I_{t,s}])=\beta_0+\beta_1 \cdot p_{t,s}+\beta_2\cdot l_{s}+\beta_3\cdot b_{t,s}+\beta_4\cdot s_{t,s}+\beta_5\cdot y_{t,s}+\beta_6\cdot f_{t,s}+ \beta_7\cdot t+ \beta_8 \cdot m_{t-1} +\\ \beta_9 \cdot al_{t,s}+\beta_{10} \cdot bl_{t,s}+\beta_{11} \cdot sl_{t,s} +\beta_{12} \cdot tl_{t,s}+ \log( n_{t,s}) \]

where here all variables are the same as above with the addition of \(al_{t,s}\), \(bl_{t,s}\), \(sl_{t,s}\), \(tl_{t,s}\), which represent the access laws, bans, screening laws, and transport laws, respectively, for state \(s\) in year \(t\).

Model Results

Below we see that in this model, year, media coverage, suicide score, fight score, access laws, and bans all have a significant effect at an \(\alpha\) level of 0.05. There is also some evidence that transport laws have an effect on school shooting incidence, though, interestingly, this slope is positive, indicating that more restrictive transport laws are associated with higher rates of school shootings.

# model fit with law counts
fit_law = glm(reformulate(names(ss_counts)[c(2,23,25,26,28,30,32,34,36:39)],names(ss_counts)[3]), offset=log(population), family = "poisson", data = ss_counts)

# table with output
kbl2 = broom::tidy(fit_law)
kbl2$term = c("Intercept","Year","Articles in previous year","Poverty", "Bullying Laws score",  "Suicide score",  "School safety score", "Fight score" ,"Bullying score", "Access Laws", "Transport Laws", "Screening Laws", "Bans")
kableExtra::kable( kbl2, align = "c" , digits = 4, caption = "GLM Output using State Laws as Covariates")%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
GLM Output using State Laws as Covariates
term estimate std.error statistic p.value
Intercept 110.0419 60.4020 1.8218 0.0685
Year -0.0626 0.0301 -2.0814 0.0374
Articles in previous year 0.0009 0.0003 3.0520 0.0023
Poverty -0.0344 0.0226 -1.5207 0.1283
Bullying Laws score 0.0616 0.0422 1.4616 0.1439
Suicide score 10.6968 4.4033 2.4292 0.0151
School safety score -5.6336 3.8883 -1.4489 0.1474
Fight score -6.6711 2.1468 -3.1075 0.0019
Bullying score -3.3383 2.8957 -1.1528 0.2490
Access Laws -0.0666 0.0274 -2.4311 0.0151
Transport Laws 0.0857 0.0520 1.6475 0.0995
Screening Laws 0.0024 0.0292 0.0832 0.9337
Bans -0.2604 0.1226 -2.1230 0.0338

Our model estimates that every addition of a restrictive (or removal of permissive) access law is associated with a 6.4% (95% C.I. (1.3%,11.3%)) drop in school shootings and, likewise, every addition of a ban is associated with a 22.9% decrease in school shootings (95% C.I. (2%,39.4%)).

# predict on raw scale for variables of interest

# bans estimate and conf int
bans_est = paste( round(1-exp(-0.2604), digits = 4)*-100, "%", sep = "")
bans_ci_l = paste( round(1-exp(-0.2604- 1.96*0.1226),digits = 4)*-100, "%", sep = "")
bans_ci_u = paste( round(1-exp(-0.2604+ 1.96*0.1226),digits = 4)*-100, "%", sep = "")

# access laws estimate and conf int
access_est = paste( round(1-exp(-0.0666), digits = 4)*-100, "%", sep = "")
access_ci_l = paste( round(1-exp(-0.0666    -1.96*0.0274), digits = 4)*-100, "%", sep = "")
access_ci_u = paste( round(1-exp(-0.0666    +1.96*0.0274), digits = 4)*-100, "%", sep = "")

# table of estimates on raw scale
kableExtra::kable(data.frame(Variable = c("Access Laws","Bans"), Estimate = c(access_est,bans_est), `C.I Lower`= c(access_ci_l,bans_ci_l),`C.I Upper`= c(access_ci_u,bans_ci_u)  ), digits = 3, caption = "Estimated Change in School Shooting Victims per Unit Increase in Law Type", align = "c")%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"), full_width = F)
Estimated Change in School Shooting Victims per Unit Increase in Law Type
Variable Estimate C.I.Lower C.I.Upper
Access Laws -6.44% -11.33% -1.28%
Bans -22.93% -39.39% -1.99%
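The hard-coded back-transformations above can be generalized with a small helper function (a sketch; the function name and interface are ours, not part of the original analysis) that converts a log-link coefficient and its standard error into a percent change in the rate with a Wald confidence interval:

```r
# Convert a log-link (Poisson) coefficient into a percent change in the rate,
# with a Wald confidence interval. Inputs: point estimate and standard error.
pct_change <- function(beta, se, level = 0.95) {
  z  <- qnorm(1 - (1 - level) / 2)   # normal quantile, 1.96 for 95%
  ci <- beta + c(-1, 1) * z * se     # CI on the log scale
  100 * (exp(c(beta, ci)) - 1)       # (estimate, lower, upper) in percent
}

# Access laws coefficient from the table above: -0.0666 (SE 0.0274)
pct_change(-0.0666, 0.0274)
```

Applied to the access laws coefficient, this reproduces the roughly -6.4% estimate with a 95% C.I. of about (-11.3%, -1.3%) reported above.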

Model Checking

Again, we check our model fit using a variety of methods. For our deviance test we get no significant evidence against our model (\(p = 0.9593\)). We also do not get evidence of overdispersion (\(p = 0.3599\)), with an estimated dispersion parameter of 0.0181. The diagnostic plots show an even better fit than our first model (the red trend lines look more nearly horizontal than in the previous model).

#check model fit using deviance
p_fit_law = 1-pchisq( fit_law$deviance, df = fit_law$df.residual)

# check for overdispersion
disp_test_law = AER::dispersiontest(fit_law,trafo=1)
p_disp_law = disp_test_law$p.value
disp_est_law = disp_test_law$estimate

# diagnostic plot
plot(DHARMa::simulateResiduals(fit_law))

# compare to zero-inflated model
z_fit = pscl::zeroinfl(reformulate(names(ss_counts)[c(2,23,25,26,28,30,32,34,36:39)],names(ss_counts)[3]), offset=log(population), dist  = "poisson", data = ss_counts)
## similar results

GEE

We again use GEE to check for unaccounted-for variability between states. Interestingly, for this set of covariates, the model output does not match our previous output with respect to significance (though the directions of the estimated coefficients remain similar). Here none of the types of law show a significant effect on school shootings, with p-values all greater than 0.4. We believe one explanation is that, because these laws do not change drastically from year to year, they may be absorbed into the state effect and hence appear insignificant after allowing for state variation. This does not necessarily mean that these laws do not play a role in school shootings. Rather, this could just be a consequence of the laws’ lack of variation across time, making their effects difficult to distinguish from other within-state variability in the data we have.

# dropped media data because otherwise there wasn't enough data
gee_dat_law = ss_counts%>%
  select(c(1,2,3,20,25,26,28,30,32,34,36:39))

gee_law = geeglm(incident_count ~ year  + poverty + suicide_score + school_safety_score + fight_score + bully_score + access_laws + transport + screening + ban,family=poisson("log"), offset = log(population),id=as.factor(state), corstr = "ar1",
                data=gee_dat_law[complete.cases(gee_dat_law),])
# n = 550 observations

# table with output
kbl2_gee = broom::tidy(gee_law)
kbl2_gee$term = c("Intercept","Year","Poverty", "Suicide score",  "School safety score", "Fight score" ,"Bullying score", "Access Laws", "Transport Laws", "Screening Laws", "Bans")
kableExtra::kable( kbl2_gee, align = "c" , digits = 4,caption = "GEE Output using State Laws as Covariates")%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
GEE Output using State Laws as Covariates
term estimate std.error statistic p.value
Intercept -109.1746 44.7266 5.9582 0.0146
Year 0.0468 0.0222 4.4672 0.0346
Poverty -0.0623 0.0347 3.2199 0.0727
Suicide score 4.8523 3.4369 1.9932 0.1580
School safety score -0.2579 2.6207 0.0097 0.9216
Fight score -4.9753 2.6294 3.5804 0.0585
Bullying score 1.3050 3.3212 0.1544 0.6944
Access Laws -0.0345 0.0492 0.4929 0.4826
Transport Laws 0.0641 0.0772 0.6885 0.4067
Screening Laws -0.0249 0.0453 0.3028 0.5821
Bans -0.0647 0.2027 0.1020 0.7494

Secondary Analysis

The analysis above uses the number of school shootings as its outcome but does not account for the severity of the events in terms of the number of victims/fatalities; it treats all shooting events equally. Next, we perform a secondary analysis using the number of victims from all school shootings in a given state and year as the outcome of interest (total wounded or killed, including the shooter), rather than just the incident count. For any state and year combination with no school shootings recorded, we assign a value of 0 for the number of victims. Hence, this analysis treats shootings with no victims (no persons wounded or killed) the same as state-years in which no shootings occur. We do this so that the analysis can be interpreted in terms of the number of school shooting victims (injuries/fatalities) that could be avoided under different gun policies.
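The zero-fill step described above amounts to expanding the recorded incidents to every state-year combination and replacing missing victim totals with 0. A minimal base-R sketch (the `shootings` data frame and its columns are hypothetical stand-ins for the real data):

```r
# Hypothetical victim totals for state-years with at least one shooting
shootings <- data.frame(
  state         = c("CA", "TX"),
  year          = c(2015, 2016),
  total_victims = c(3, 1)
)

# Every state-year combination in the study window
grid <- expand.grid(state = c("CA", "TX"), year = 2015:2016,
                    stringsAsFactors = FALSE)

# Left join, then treat state-years with no recorded shootings as 0 victims
full <- merge(grid, shootings, by = c("state", "year"), all.x = TRUE)
full$total_victims[is.na(full$total_victims)] <- 0
```

This mirrors the `ifelse(is.na(total_victims), 0, total_victims)` step used in the model-fitting code below.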

Model Results

We fit negative binomial models using the same predictors as the earlier analysis, first using state gun policy grades (on a 4.0 GPA scale) and next using the variables corresponding to the four types of laws, with the number of victims per year in each state as the outcome. The rest of the covariates are the same as above. We choose negative binomial models in this case because the Poisson models fit poorly (deviance test p-value of 0 and an estimated dispersion parameter of 2.39).

Model using grades:

The first model we fit uses state gun policy grades on a 4.0 GPA scale, along with the other variables mentioned above, as predictors of the number of victims from school shootings. We see from the model that for every one-unit increase in GPA we estimate a 19.1% drop in the number of victims (per state in a given year), with a 95% C.I. of (-34.88%, 0.54%) for the percent change in victim count. We check our model fit with a deviance test and do not get evidence against our model (p = 0.993).

# why we didn't fit Poisson models
pois_fit = glm(total_victims~gpa+poverty+bullying_score+ prev_year_art+year+suicide_score+fight_score+school_safety_score+bully_score+ offset(log(population)),  data = ss_counts%>% mutate(total_victims = ifelse(is.na(total_victims),0,total_victims)), family = "poisson")

#estimate of dispersion
disp = pois_fit$deviance/pois_fit$df.residual
#2.39

# fit model written above
fit_grade_victims = MASS::glm.nb(total_victims~gpa+poverty+bullying_score+ prev_year_art+year+suicide_score+fight_score+school_safety_score+bully_score+ offset(log(population)),  data = ss_counts%>%
  mutate(total_victims = ifelse(is.na(total_victims),0,total_victims)))

# table with model output
kbl_vict = broom::tidy(fit_grade_victims)
kbl_vict$term = c("Intercept","GPA", "Poverty", "Bullying Laws score", "Articles in previous year", "Year","Suicide score", "Fight score" , "School safety score", "Bullying score")
kableExtra::kable( kbl_vict, align = "c" , digits = 4, caption = "GLM Output Using State Policy Grades as Covariates and Number of Victims as the Outcome")%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
GLM Output Using State Policy Grades as Covariates and Number of Victims as the Outcome
term estimate std.error statistic p.value
Intercept 24.4992 178.8202 0.1370 0.8910
GPA -0.2118 0.1108 -1.9113 0.0560
Poverty -0.0625 0.0419 -1.4910 0.1360
Bullying Laws score 0.1116 0.0644 1.7330 0.0831
Articles in previous year 0.0003 0.0005 0.5941 0.5524
Year -0.0207 0.0889 -0.2325 0.8161
Suicide score 11.2271 6.7357 1.6668 0.0956
Fight score 5.8992 3.2758 1.8008 0.0717
School safety score -16.5938 6.7962 -2.4416 0.0146
Bullying score 0.9127 5.5661 0.1640 0.8698
# check model fit
check_fit = 1-pchisq(fit_grade_victims$deviance, df = fit_grade_victims$df.residual)
# estimate 
# access laws estimate and conf int
gpa_est = paste( round(1-exp(-0.2118), digits = 4)*-100, "%", sep = "")
gpa_ci_l = paste( round(1-exp(-0.2118 - 1.96*   0.1108), digits = 4)*-100, "%", sep = "")
gpa_ci_u = paste( round(1-exp(-0.2118 + 1.96*   0.1108), digits = 4)*-100, "%", sep = "")

Model using laws:

We next fit a similar model, this time replacing GPA with counts for each of the four categories of laws in effect in each state per year (restrictive laws - permissive laws). We choose to fit a negative binomial model once again for the same reasons as above.

# fit model
fit_law_victims = MASS::glm.nb(total_victims ~ year+prev_year_art + poverty + bullying_score + suicide_score + school_safety_score + fight_score + bully_score + access_laws + 
    transport + screening + ban+ offset(log(population)), data = ss_counts%>%
  mutate(total_victims = ifelse(is.na(total_victims),0,total_victims))) # need to add in zero for rows with no shootings

# table with model output
kbl3 = broom::tidy(fit_law_victims)
kbl3$term = c("Intercept","Year","Articles in previous year","Poverty", "Bullying Laws score",  "Suicide score",  "School safety score", "Fight score" ,"Bullying score", "Access Laws", "Transport Laws", "Screening Laws", "Bans")
kableExtra::kable( kbl3, align = "c" , digits = 4, caption = "GLM Output Using State Laws as Covariates and Number of Victims as the Outcome")%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
GLM Output Using State Laws as Covariates and Number of Victims as the Outcome
term estimate std.error statistic p.value
Intercept -22.8540 91.2353 -0.2505 0.8022
Year 0.0031 0.0454 0.0687 0.9452
Articles in previous year 0.0005 0.0004 1.2542 0.2098
Poverty -0.0654 0.0324 -2.0201 0.0434
Bullying Laws score 0.0894 0.0502 1.7805 0.0750
Suicide score 16.1287 6.2710 2.5720 0.0101
School safety score -16.1897 6.0430 -2.6791 0.0074
Fight score 2.7016 2.6361 1.0249 0.3054
Bullying score -1.7542 4.3870 -0.3999 0.6893
Access Laws -0.1089 0.0378 -2.8825 0.0039
Transport Laws 0.2066 0.0872 2.3691 0.0178
Screening Laws 0.0722 0.0369 1.9573 0.0503
Bans -0.4477 0.1708 -2.6215 0.0088

Here we see that there is evidence of an effect for all four types of gun laws. Below we show the estimated change in the number of victims from school shootings per unit change in each type of gun law (negative values correspond to decreases in victim counts). For context, the average response value in the data set is 0.8564 victims per state and year. Interestingly, in this analysis screening and transport laws both have positive coefficients, indicating that stricter laws are associated with more fatalities/injuries from school shootings. While the effect of screening laws is relatively small, the transport law effect is sizable. Future analysis may want to examine these laws in more detail to understand this trend.

# predict on raw scale for variables of interest
# include estimated change and conf int

# bans
bans_est = paste( round(1-exp(-0.4477), digits = 4)*-100, "%", sep = "")
bans_ci_l = paste( round(1-exp(-0.4477  -1.96*0.1708), digits = 4)*-100, "%", sep = "")
bans_ci_u = paste( round(1-exp(-0.4477  +1.96*0.1708), digits = 4)*-100, "%", sep = "")

# access laws
access_est = paste( round(1-exp(-0.1089), digits = 4)*-100, "%", sep = "")
access_ci_l = paste( round(1-exp(-0.1089-1.96*  0.0378), digits = 4)*-100, "%", sep = "")
access_ci_u = paste( round(1-exp(-0.1089+1.96*  0.0378), digits = 4)*-100, "%", sep = "")

#transport laws
trans_est = paste("+", round( 1-exp(0.2066), digits = 4)*-100, "%", sep = "")
trans_ci_l = paste( round( 1-exp(0.2066-1.96*   0.0872), digits = 4)*-100, "%", sep = "")
trans_ci_u = paste( round( 1-exp(0.2066+1.96*   0.0872), digits = 4)*-100, "%", sep = "")

#screening laws
screen_est = paste("+", round(1-exp(0.0722), digits = 4)*-100, "%", sep = "")
screen_ci_l = paste( round(1-exp(0.0722-1.96*   0.0369), digits = 4)*-100, "%", sep = "")
screen_ci_u = paste( round(1-exp(0.0722+1.96*   0.0369), digits = 4)*-100, "%", sep = "")

# print table
kableExtra::kable(data.frame(Variable = c("Access Laws","Bans","Transport Laws", "Screening Laws"), Estimate = c(access_est,bans_est, trans_est, screen_est), `C.I Lower`=c(access_ci_l,bans_ci_l, trans_ci_l,screen_ci_l) ,`C.I Upper`=c(access_ci_u,bans_ci_u, trans_ci_u, screen_ci_u)), caption = "Estimated Change in School Shooting Victims per Unit Increase in Law Type", align = "c")%>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"), full_width = F)
Estimated Change in School Shooting Victims per Unit Increase in Law Type
Variable Estimate C.I.Lower C.I.Upper
Access Laws -10.32% -16.72% -3.42%
Bans -36.09% -54.27% -10.68%
Transport Laws +22.95% 3.63% 45.87%
Screening Laws +7.49% -0.01% 15.55%

Conclusions

Recommendation

The analysis showed clear evidence that states with more restrictive gun laws experienced fewer school shootings, leading us to suggest that strengthening gun policies and imposing more restrictive gun laws may help reduce the number of school shootings. The analysis of individual types of gun laws indicates that controlling the accessibility of guns and banning certain firearms may reduce school shooting rates. Based on the other findings of our primary analysis, we would also suggest paying more attention to media coverage of school shooting incidents. We recommend that media outlets not broadcast the names and details of shooters, and instead focus on the victims, as potential attackers may be motivated by the publicity. Youth mental health should be given more priority, and states can offer behavioral interventions in schools to promote mental health.

Limitations

The gun law grades were assigned to each state based on how restrictive and effective that state's gun policies and laws are; however, the source website provides few details about the grading mechanism. It does not clearly state which laws were considered when grading the states, nor how much weight each gun law was given in calculating the score. In addition, the difference between adjacent grades (such as B to A, or F to E) was treated as an equal change, since in both cases more gun policies or laws had been introduced and the grade improved by one step. This ignores the fact that a state moving from F to E may have had to make larger changes than a state that already had stronger gun laws.

Similarly, when investigating the effects of the different gun laws separately, we assumed that all gun laws hold similar importance and assigned equal weights to all laws: for each type of law, each permissive law change was assigned -1 and each restrictive law change was assigned +1. Again, as mentioned in the data section, some of the laws changed in the middle of the year, and we decided to round to the nearest year, which may cause some loss of information. In the dataset we considered law changes that took place after 2010. However, changes in gun laws prior to 2010 may still influence a state's gun policy environment in later years and may have a confounding effect on our analysis, which we are unable to adjust for. We are also aware that even when laws change at a certain time point, it takes time for the changes to have a significant impact on overall gun prevalence and gun violence, which makes the effect of gun policies on school shootings difficult to quantify.

States with larger populations presumably have more schools and potentially more school shooting incidents, so we wished to adjust for population. We could have adjusted the model for the school-age population only, but the dataset shows a large number of shooters older than 18. Hence, we decided to use the overall population of each state in each year rather than just the school-age population.

As the literature review suggested, we wished to adjust for the prevalence of mental health illness in each state over the study period, and we collected data on the rate of youths who suffered from major depression but did not receive any mental health treatment. However, the exploratory data analysis revealed extreme inconsistencies in the data. The documentation did not clearly specify the survey years, and although the 2018 and 2019 reports both stated that they used survey data from 2014-2015, their values differed. Because the reports did not explicitly define the survey years, we decided not to include these data in our analysis. Had the dataset been consistent, it would have added more information to our analysis.

Future Work

We selected the poverty rate as an indicator of the financial stability and economic condition of each state. As the literature review implies, we could instead have collected the unemployment rate for each state and year, but since the majority of school shooters were students, it is reasonable to assume that the financial stability of students' families has more impact on school shootings than the unemployment rate does. For future analysis, we could account for the unemployment rate, income per household, and other economic variables to make the study more exhaustive.

Bullying at school is often considered one of the crucial components of school shooting incidents, and thus we wished to account for variation in existing bullying laws across states. However, the most credible dataset available for each state contained data only for the year 2010. We therefore included the variable to adjust for variation in state laws under the assumption that states maintained the same stringency of bullying laws throughout the study period. The analysis would have been more thorough with data for all years.

References

  1. Walker C. (2019, July) 10 years. 180 school shootings. 356 victims. Retrieved from: https://www.cnn.com/interactive/2019/07/us/ten-years-of-school-shootings-trnd/

  2. Federal Bureau of Investigation. A Study of Active Shooter Incidents in the United States Between 2000 and 2013. 2014; Available at www.fbi.gov/news/stories/2014/september (accessed January 30, 2015).

  3. Knowles H (2019, November 15) 16-year-old shooter at California high school has died, authorities say. Retrieved from: https://www.washingtonpost.com/nation/2019/11/15/year-old-santa-clarita-high-school-shooting-suspect-has-died-authorities-say/

  4. Towers, S., Gomez-Lievano, A., Khan, M., Mubayi, A., & Castillo-Chavez, C. (2015). Contagion in mass killings and school shootings. PLoS one, 10(7), e0117259.

  5. John Woodrow Cox, Steven Rich, Allyson Chiu, John Muyskens and Monica Ulmanu (Updated: 2019, November 14). More than 233,000 students have experienced gun violence at school since Columbine. Retrieved from: https://www.washingtonpost.com/graphics/2018/local/school-shootings-database/

  6. Breslau, N., Kessler, R. C., Chilcoat, H. D., Schultz, L. R., Davis, G. C., & Andreski, P. (1998). Trauma and posttraumatic stress disorder in the community: the 1996 Detroit Area Survey of Trauma. Archives of general psychiatry, 55(7), 626-632.

  7. Lowe, S. R., Blachman-Forshay, J., & Koenen, K. C. (2015). Epidemiology of trauma and trauma-related disorders: trauma as a public health issue. Evidence-based treatments for trauma-related psychological disorders: A practical guide for clinicians, 11-40.

  8. US Dept of Education. Final report and findings of the safe school initiative: Implications for the prevention of school attacks in the United States, 2004.

  9. Richardson EG, Hemenway D. Homicide, suicide, and unintentional firearm fatality: comparing the United States with other high-income countries, 2003. The Journal of Trauma and Acute Care Surgery. 2011; 70: 238–243.

  10. Hemenway D, Miller M. Firearm availability and homicide rates across 26 high-income countries. Journal of Trauma and Acute Care Surgery. 2000; 49(6): 985–988.

  11. Krug, E. G., Mercy, J. A., Dahlberg, L. L., & Powell, K. E. (1998). Firearm-and non-firearm-related homicide among children: an international comparison. Homicide Studies, 2(1), 83-95.

  12. CIA Factbook. The World Factbook; 2010. See also: http://www.ciagov/library/publications/the-world-factbook, accessed January 30, 2015.

  13. Metzl, J. M., & MacLeish, K. T. (2015). Mental illness, mass shootings, and the politics of American firearms. American journal of public health, 105(2), 240-249.

  14. Vossekuil, B., Fein, R. A., Reddy, M., Borum, R., & Modzeleski, W. (2002). The final report and findings of the safe school initiative: Implications for the prevention of school attacks in the United States. Washington, DC: U.S. Secret Service and U.S. Department of Education.

  15. Lee, J. H. (2013). School shootings in the U.S. public schools: Analysis through the eyes of an educator. Review of Higher Education and Self-Learning, 6, 88–120.

  16. Pah, A. R., Hagan, J., Jennings, A. L., Jain, A., Albrecht, K., Hockenberry, A. J., & Amaral, L. A. N. (2017). Economic insecurity and the rise in gun violence at US schools. Nature Human Behaviour, 1(2), 0040.

  17. Jetter, M., & Walker, J. (2018). The Effect of Media Coverage on Mass Shootings.

  18. Christensen, J. (2017). Why the US has the most mass shootings. CNN, published August 27, 2015.